Learning transformed product distributions

Authors

  • Constantinos Daskalakis
  • Ilias Diakonikolas
  • Rocco A. Servedio
Abstract

We consider the problem of learning an unknown product distribution $X$ over $\{0,1\}^n$ using samples $f(X)$, where $f$ is a known transformation function. Each choice of a transformation function $f$ specifies a learning problem in this framework. Information-theoretic arguments show that for every transformation function $f$ the corresponding learning problem can be solved to accuracy $\epsilon$, using $\tilde{O}(n/\epsilon^2)$ examples, by a generic algorithm whose running time may be exponential in $n$. We show that this learning problem can be computationally intractable even for constant $\epsilon$ and rather simple transformation functions. Moreover, the above sample complexity bound is nearly optimal for the general problem, as we give a simple explicit linear transformation function $f(x) = w \cdot x$ with integer weights $w_i \le n$ and prove that the corresponding learning problem requires $\Omega(n)$ samples. As our main positive result we give a highly efficient algorithm for learning a sum of independent unknown Bernoulli random variables, corresponding to the transformation function $f(x) = \sum_{i=1}^{n} x_i$. Our algorithm learns to $\epsilon$-accuracy in $\mathrm{poly}(n)$ time, using a surprising $\mathrm{poly}(1/\epsilon)$ number of samples that is independent of $n$. We also give an efficient algorithm that uses $\log n \cdot \mathrm{poly}(1/\epsilon)$ samples but has running time that is only $\mathrm{poly}(\log n, 1/\epsilon)$.
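To make the setting of the main positive result concrete, the following is a minimal sketch of the learning problem for $f(x) = \sum_{i=1}^{n} x_i$. It is not the algorithm from the paper: it uses the naive empirical histogram as the learner, purely to illustrate what "samples of $f(X)$" and "accuracy $\epsilon$" (measured here in total variation distance, an assumption about the metric) look like. All function names, the choice of $n = 20$, and the sample count are hypothetical.

```python
import random
from collections import Counter


def pbd_pmf(p):
    """Exact pmf of the sum of independent Bernoulli(p_i) variables.

    Built by convolving one Bernoulli at a time (dynamic programming).
    """
    pmf = [1.0]  # with zero variables, the sum is 0 with probability 1
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - pi)  # this variable is 0
            nxt[k + 1] += mass * pi      # this variable is 1
        pmf = nxt
    return pmf


def sample_sum(p, rng):
    """Draw one sample of f(X) = x_1 + ... + x_n for the product distribution X."""
    return sum(1 for pi in p if rng.random() < pi)


def empirical_pmf(samples, n):
    """Naive learner (not the paper's): the empirical histogram over {0, ..., n}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts.get(k, 0) / m for k in range(n + 1)]


def total_variation(a, b):
    """Total variation distance between two pmfs on the same support."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [rng.random() for _ in range(20)]  # hypothetical unknown parameters
    samples = [sample_sum(p, rng) for _ in range(5000)]
    estimate = empirical_pmf(samples, len(p))
    print("TV distance to the true distribution:",
          total_variation(pbd_pmf(p), estimate))
```

The learner only ever sees the integer sums, never the individual bits. The empirical histogram is a baseline whose sample requirement grows with the size of the support; the abstract's claim is that a poly(1/ε) number of samples, independent of n, suffices for this particular transformation.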


Similar resources

Minimax Estimation of the Scale Parameter in a Family of Transformed Chi-Square Distributions under Asymmetric Squared Log Error and MLINEX Loss Functions

This paper is concerned with the problem of finding the minimax estimators of the scale parameter $\theta$ in a family of transformed chi-square distributions, under asymmetric squared log error (SLE) and modified linear exponential (MLINEX) loss functions, using the Lehmann Theorem [2]. Also we show that the results of Podder et al. [4] for Pareto distribution are a special case of our results for th...


Learning Mixtures of Product Distributions Using Correlations and Independence

We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions $D = \{D_1, \ldots, D_T\}$ and mixing weights $\{w_1, \ldots, w_T\}$ such that


Learning Mixtures of Discrete Product Distributions using Spectral Decompositions

We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have s...


Activized Learning: Transforming Passive to Active with Improved Label Complexity

We study the theoretical advantages of active learning over passive learning. Specifically, we prove that, in noise-free classifier learning for VC classes, any passive learning algorithm can be transformed into an active learning algorithm with asymptotically strictly superior label complexity for all nontrivial target functions and distributions. We further provide a general characterization ...


Activized Learning: Transforming Passive to Active with Improved Label Complexity (Working Notes: Updated January 2011)

We study the theoretical advantages of active learning over passive learning. Specifically, we prove that, in noise-free classifier learning for VC classes, any passive learning algorithm can be transformed into an active learning algorithm with asymptotically strictly superior label complexity for all nontrivial target functions and distributions, in many cases without significant loss in comp...



Journal:
  • CoRR

Volume: abs/1103.0598

Publication date: 2011